Multi-view data containing complementary and consensus information can facilitate representation learning by exploiting the intact integration of multi-view features. Because objects in the real world often have underlying connections, organizing multi-view data as heterogeneous graphs helps extract latent information shared among different objects. Graph Convolutional Networks (GCNs) are powerful at aggregating information from neighboring nodes, so in this paper we apply a GCN to heterogeneous-graph data derived from multi-view data, a setting that remains under-explored in the GCN literature. To improve the quality of the network topology and reduce the noise introduced by graph fusion, some methods perform sorting operations before the graph convolution step. These GCN-based methods generally sort and select the most confident neighboring nodes for each vertex, for example by picking the top-k nodes according to pre-defined confidence values. This is problematic, however, because sorting operators are non-differentiable and the resulting graph embedding learning is inflexible, which may block gradient computation and degrade performance. To address these issues, we propose a joint framework dubbed Multi-view Graph Convolutional Network with Differentiable Node Selection (MGCN-DNS), which consists of an adaptive graph fusion layer, a graph learning module and a differentiable node selection schema. MGCN-DNS accepts multi-channel graph-structured data as input and aims to learn a more robust graph fusion through a differentiable neural network. The effectiveness of the proposed method is verified by rigorous comparisons with a number of state-of-the-art approaches on multi-view semi-supervised classification tasks.
The recently introduced trojan attack on deep neural network (DNN) models is an insidious variant of data poisoning attacks. Trojan attacks exploit an effective backdoor created in a DNN model, leveraging the difficulty of interpreting the learned model, to misclassify any input stamped with the attacker's chosen trojan trigger. Since the trojan trigger is a secret guarded and exploited by the attacker, detecting such trojaned inputs is a challenge, especially at run-time when models are in active operation. This work builds a STRong Intentional Perturbation (STRIP) based run-time trojan attack detection system and focuses on vision systems. We intentionally perturb the incoming input, for instance by superimposing various image patterns, and observe the randomness of the classes predicted for the perturbed inputs by a given deployed model, whether malicious or benign. A low entropy in the predicted classes violates the input-dependence property of a benign model and implies the presence of a malicious input, which is a characteristic of a trojaned input. The high efficacy of our method is validated through case studies on three popular and contrasting datasets: MNIST, CIFAR10 and GTSRB. We achieve an overall false acceptance rate (FAR) of less than 1%, given a preset false rejection rate (FRR) of 1%, for different types of triggers. Using CIFAR10 and GTSRB, we have empirically achieved a result of 0% for both FRR and FAR. We have also evaluated the robustness of STRIP against a number of trojan attack variants and adaptive attacks.
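The perturb-and-measure-entropy idea described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names (`blend`, `strip_entropy`, `is_trojaned`), the linear blending with weight `alpha`, the averaging of per-copy entropies, the threshold value, and the toy model are all assumptions made for the example.

```python
import math
import random

def blend(x, overlay, alpha=0.5):
    """Superimpose an overlay pattern on the input (pixel-wise linear blend, assumed)."""
    return [(1 - alpha) * a + alpha * b for a, b in zip(x, overlay)]

def shannon_entropy(probs):
    """Shannon entropy (in bits) of a predicted class distribution."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

def strip_entropy(model, x, overlays, alpha=0.5):
    """Mean prediction entropy over perturbed copies of x."""
    entropies = [shannon_entropy(model(blend(x, o, alpha))) for o in overlays]
    return sum(entropies) / len(entropies)

def is_trojaned(model, x, overlays, threshold, alpha=0.5):
    """Flag the input when predictions stay suspiciously stable under perturbation."""
    return strip_entropy(model, x, overlays, alpha) < threshold

def toy_model(x):
    """Hypothetical backdoored classifier over 3 classes.

    If pixel 0 (the stand-in trigger) is still strong, it always predicts
    class 2; otherwise the prediction depends on the rest of the input.
    """
    if x[0] > 0.45:
        return [0.0, 0.0, 1.0]
    logits = [4 * x[1], 4 * x[2], 0.0]
    exps = [math.exp(v) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

rng = random.Random(0)
# Hypothetical overlay patterns: random images that do not contain the trigger.
overlays = [[0.0, rng.random(), rng.random()] for _ in range(20)]

clean_input = [0.0, 0.9, 0.1]
trojaned_input = [1.0, 0.9, 0.1]  # trigger pixel set

print(is_trojaned(toy_model, clean_input, overlays, threshold=0.1))
print(is_trojaned(toy_model, trojaned_input, overlays, threshold=0.1))
```

In the toy setup the trigger survives blending, so every perturbed copy of the trojaned input gets the same prediction and its entropy collapses, while the clean input's predictions keep varying with the overlays.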
With the demand for standardized large-scale livestock farming and the development of artificial intelligence technology, a lot of research on animal face recognition has been carried out on pigs, cattle, sheep and other livestock. Face recognition consists of three sub-tasks: face detection, face normalization and face identification. Most animal face recognition studies focus on face detection and face identification. Animals are often uncooperative when photographed, so the collected animal face images are often in arbitrary orientations. The use of such non-standard images may significantly reduce the performance of a face recognition system. However, there has been no study on normalizing animal face images with arbitrary orientations. In this study, we developed a light-weight angle detection and region-based convolutional network (LAD-RCNN) containing a new rotation angle coding method that can detect the rotation angle and the location of an animal face in one stage. LAD-RCNN has a frame rate of 72.74 FPS (including all steps) on a single GeForce RTX 2080 Ti GPU. LAD-RCNN has been evaluated on multiple datasets, including a goat dataset and goat infrared images. The evaluation results show that the AP of face detection was more than 95% and the deviation between the detected rotation angle and the ground-truth rotation angle was less than 0.036 (i.e. 6.48°) on all the test datasets. This shows that LAD-RCNN performs very well at detecting livestock faces and their orientation, and is therefore well suited to livestock face detection and normalization. Code is available at https://github.com/SheepBreedingLab-HZAU/LAD-RCNN/
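The two values quoted for the angle deviation are consistent with a normalized angle whose unit corresponds to 180°. This scaling is inferred from the numbers in the abstract and is not stated by it:

```latex
\[
0.036 \times 180^{\circ} = 6.48^{\circ}
\]
```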
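Bearing fault diagnosis is essential for reducing the risk of damage to rotating machines and for improving economic returns. Recently, machine learning, represented by deep learning, has made great progress in bearing fault diagnosis. However, applying deep learning to this task still faces a major problem: deep networks are notoriously black boxes. It is difficult to know how a model distinguishes faulty signals from normal ones, and what physical principles lie behind the classification. To address the interpretability issue, we first prototype a convolutional network built from recently invented quadratic neurons. Thanks to the strong feature representation ability of quadratic neurons, this quadratic-neuron-empowered network can handle noisy bearing data. Moreover, we independently derive an attention mechanism from a quadratic neuron, referred to as qttention, by factorizing the learned quadratic function in analogy to attention, which makes a model with quadratic neurons inherently interpretable. Experiments on public datasets and on our own dataset show that the proposed network enables effective and interpretable bearing fault diagnosis.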
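Text-based person search aims to retrieve images of a particular pedestrian from a textual description. The key challenge of this task is to eliminate the inter-modality gap and achieve feature alignment across modalities. In this paper, we propose a semantic-aligned method for text-based person search, in which feature alignment across modalities is achieved by automatically learning semantic-aligned visual features and textual features. First, we introduce two Transformer-based backbones to encode robust feature representations of the images and texts. Second, we design a semantic-aligned feature aggregation network that adaptively selects and aggregates features with the same semantics into part-aware features. This is implemented by a multi-head attention module constrained by a cross-modality part alignment loss and a diversity loss. Experimental results on the CUHK-PEDES and Flickr30K datasets show that our method achieves state-of-the-art performance.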
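Sparse representation of graphs has shown great potential for accelerating the computation of graph applications (e.g. social networks, knowledge graphs) on traditional computing architectures (CPU, GPU or TPU). However, the exploration of large-scale sparse graph computing on processing-in-memory (PIM) platforms, typically built on memristive crossbars, is still in its infancy. When we set out to implement the computation or storage of large-scale or batched graphs on memristive crossbars, a natural assumption is that we need a large-scale crossbar, but one with low utilization. Some recent works have questioned this assumption, avoiding the waste of storage and computational resources through "block partition". These partitions are fixed-size, progressively scheduled or coarse-grained, and are therefore, in our view, not effectively sparsity-aware. This work proposes a method for generating a dynamic sparsity-aware mapping scheme, which models the problem as a sequential decision-making problem solved by a reinforcement learning (RL) algorithm (REINFORCE). Our generative model (an LSTM combined with our dynamic-fill mechanism) achieves remarkable mapping performance on small-scale typical graph/matrix data (43% of the area of the original, fully mapped matrix) and on two large-scale matrices (22.5% of the area on QH882 and 17.1% of the area on QH1484). Moreover, the coding framework of our scheme is intuitive and shows promising adaptability to deployment or compilation systems.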
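Differentiable architecture search has gradually become the mainstream research topic in Neural Architecture Search (NAS) because of its ability to improve efficiency compared with early NAS methods (EA-based, RL-based). Recent differentiable NAS work also aims to further improve search efficiency, reduce GPU memory consumption, and address the "depth gap" issue. However, these methods cannot handle non-differentiable objectives, let alone multiple objectives such as performance, robustness, efficiency, and other metrics. We propose TND-NAS, an end-to-end architecture search framework towards non-differentiable objectives, which combines the high efficiency of the differentiable NAS framework with the compatibility with non-differentiable metrics found in multi-objective NAS (MNAS). Under the differentiable NAS framework, with a continuous relaxation of the search space, TND-NAS optimizes the architecture parameters ($\alpha$) in discrete space while adopting a search policy that progressively shrinks the supernetwork according to $\alpha$. Our representative experiment takes two objectives (parameters, accuracy) as an example: we obtain a series of high-performance compact architectures on the CIFAR10 (1.09M/3.3%, 2.4M/2.95%, 9.57M/2.54%) and CIFAR100 (2.46M/18.3%, 5.46/16.73%, 12.88/15.20%) datasets. Favorably, in real-world scenarios (resource-constrained, platform-specialized), Pareto-optimal solutions can be conveniently reached with TND-NAS.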
Deep learning models can achieve high accuracy when trained on large amounts of labeled data. However, real-world scenarios often involve several challenges: Training data may become available in installments, may originate from multiple different domains, and may not contain labels for training. Certain settings, for instance medical applications, often involve further restrictions that prohibit retention of previously seen data due to privacy regulations. In this work, to address such challenges, we study unsupervised segmentation in continual learning scenarios that involve domain shift. To that end, we introduce GarDA (Generative Appearance Replay for continual Domain Adaptation), a generative-replay based approach that can adapt a segmentation model sequentially to new domains with unlabeled data. In contrast to single-step unsupervised domain adaptation (UDA), continual adaptation to a sequence of domains enables leveraging and consolidation of information from multiple domains. Unlike previous approaches in incremental UDA, our method does not require access to previously seen data, making it applicable in many practical scenarios. We evaluate GarDA on two datasets with different organs and modalities, where it substantially outperforms existing techniques.
The development of social media user stance detection and bot detection methods relies heavily on large-scale, high-quality benchmarks. However, in addition to low annotation quality, existing benchmarks generally have incomplete user relationships, which hinders graph-based account detection research. To address these issues, we propose a Multi-Relational Graph-Based Twitter Account Detection Benchmark (MGTAB), the first standardized graph-based benchmark for account detection. To our knowledge, MGTAB was built from the largest original dataset in the field, with over 1.55 million users and 130 million tweets. MGTAB contains 10,199 expert-annotated users and 7 types of relationships, ensuring high-quality annotation and diversified relations. In MGTAB, we extracted the 20 user property features with the greatest information gain, together with user tweet features, as the user features. In addition, we performed a thorough evaluation of MGTAB and other public datasets. Our experiments found that graph-based approaches are generally more effective than feature-based approaches and perform better when multiple relations are introduced. By analyzing the experimental results, we identify effective approaches for account detection and suggest potential future research directions in this field. Our benchmark and standardized evaluation procedures are freely available at: https://github.com/GraphDetec/MGTAB.
As one of the prevalent methods for building automation systems, Imitation Learning (IL) shows promising performance in a wide range of domains. However, despite the considerable improvement in policy performance, research on the explainability of IL models is still limited. Inspired by recent approaches in explainable artificial intelligence, we propose a model-agnostic explanation framework for IL models called R2RISE. R2RISE aims to explain the overall policy performance with respect to the frames in the demonstrations. It iteratively retrains the black-box IL model on randomly masked demonstrations and uses the conventional evaluation outcome, the environment return, as the coefficient to build an importance map. We also conducted experiments to investigate three major questions: whether frames are equally important, how effective the importance map is, and how importance maps from different IL models are connected. The results show that R2RISE successfully identifies the important frames in the demonstrations.
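The mask, retrain, and weight-by-return loop described above can be sketched as follows. This is only an illustration under stated assumptions, not the R2RISE implementation: the function name `masked_retraining_importance`, the independent per-frame masking with `keep_prob`, the per-frame averaging of returns, and the toy `train_and_evaluate` stand-in are all hypothetical.

```python
import random

def masked_retraining_importance(frames, train_and_evaluate,
                                 n_masks=200, keep_prob=0.5, seed=0):
    """Score each demonstration frame by the mean return of runs that kept it.

    train_and_evaluate(kept_frames) is a black box that retrains the IL model
    on the kept frames and returns the environment return of the new policy.
    """
    rng = random.Random(seed)
    weighted = [0.0] * len(frames)
    counts = [0] * len(frames)
    for _ in range(n_masks):
        # Randomly mask the demonstration frames.
        mask = [rng.random() < keep_prob for _ in frames]
        kept = [f for f, m in zip(frames, mask) if m]
        ret = train_and_evaluate(kept)
        # The return acts as the coefficient for every frame that was kept.
        for i, m in enumerate(mask):
            if m:
                weighted[i] += ret
                counts[i] += 1
    return [w / c if c else 0.0 for w, c in zip(weighted, counts)]

def toy_train_and_evaluate(kept_frames):
    """Hypothetical stand-in: the policy only succeeds if it saw the 'key' frame."""
    return 1.0 if "key" in kept_frames else 0.0

frames = ["a", "b", "key", "c", "d"]
importance = masked_retraining_importance(frames, toy_train_and_evaluate)
print(importance.index(max(importance)))
```

In the toy setup the frame labelled "key" receives the highest score, because every run that kept it achieved the full return while the other frames were only sometimes accompanied by it.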